Apollo: Learning Query Correlations for Predictive Caching in Geo-Distributed Systems
Abstract
The performance of modern geo-distributed database applications is increasingly dependent on remote access latencies. Systems that cache query results to bring data closer to clients are gaining popularity, but they do not dynamically learn and exploit access patterns in client workloads. We present a novel prediction framework that identifies and makes use of workload characteristics obtained from data access patterns to exploit query relationships within an application’s database workload. We have designed and implemented this framework as Apollo, a system that learns query patterns and adaptively uses them to predict future queries and cache their results. Through extensive experimentation with two different benchmarks, we show that Apollo provides significant performance gains over popular caching solutions through reduced query response time. Our experiments demonstrate Apollo’s robustness and scalability as a predictive cache for geo-distributed database applications.
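The general idea of learning which queries tend to follow one another, and caching the predicted next result ahead of time, can be illustrated with a small sketch. This is not Apollo's actual algorithm, which the abstract does not detail. It is a toy first-order model, and the class name `PredictiveCache`, the `fetch` callable standing in for the remote database, and the `min_confidence` threshold are all hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict


class PredictiveCache:
    """Toy predictive cache (hypothetical design, not Apollo's implementation)."""

    def __init__(self, fetch, min_confidence=0.5):
        self.fetch = fetch                        # callable: query -> result (stand-in for a remote database)
        self.min_confidence = min_confidence      # assumed threshold for acting on a prediction
        self.transitions = defaultdict(Counter)   # previous query -> counts of queries that followed it
        self.cache = {}
        self.prev = None
        self.hits = 0
        self.misses = 0

    def predict(self, query):
        """Return the most frequent follower of `query`, or None if too uncertain."""
        followers = self.transitions[query]
        total = sum(followers.values())
        if total == 0:
            return None
        nxt, count = followers.most_common(1)[0]
        return nxt if count / total >= self.min_confidence else None

    def invalidate(self):
        """Drop cached results (e.g. after a write); learned patterns are kept."""
        self.cache.clear()

    def execute(self, query):
        # Learn: record that `query` followed the previous query.
        if self.prev is not None:
            self.transitions[self.prev][query] += 1
        self.prev = query

        # Serve from the cache when possible, otherwise go to the remote source.
        if query in self.cache:
            self.hits += 1
            result = self.cache[query]
        else:
            self.misses += 1
            result = self.fetch(query)
            self.cache[query] = result

        # Predict the likely next query and cache its result ahead of time.
        predicted = self.predict(query)
        if predicted is not None and predicted not in self.cache:
            self.cache[predicted] = self.fetch(predicted)
        return result
```

In this sketch, once query B has been observed following query A, a later execution of A also fetches B's result, so a subsequent B is served locally even if the cache was emptied in between. A real system would also need eviction, parameterized query templates, and consistency handling, none of which are modeled here.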
Similar resources
Caching in Multi-Agent based Architecture for Distributed Information Retrieval Systems
Caching is an effective technique for improving performance in Databases and Information Retrieval (IR) systems. Traditional IR systems access the collection indices to perform searches. Such searches on large corpora for queries oft repeated can be computationally redundant. In addition, querying remote sources can be expensive because of large communication overheads and frequent inavailabilit...
Cache Investment Strategies
Emerging client-server and peer-to-peer distributed information systems employ data caching to improve performance and reduce the need for remote access to data. In distributed database systems, caching is a by-product of query operator placement: data that are brought to a site by a query operator can be retained at that site for future use. Operator placement, however, must take the location...
Improving the Performance of SQL Join Operation in the Distributed Enterprise Information System by Caching
The enterprise information system (EIS) contains databases and other data sources in multiple data centers. Users query the EIS via clients. The client has a working space in the cloud. Caching data in client space will reduce the total execution time of the query. However, the client space has limited resources to store data. There are two options for caching data at the client space: caching ...
Query-Driven Indexing in Large-Scale Distributed Systems
Efficient and effective search in large-scale data repositories requires complex indexing solutions deployed on a large number of servers. Web search engines such as Google and Yahoo! already rely upon complex systems to be able to return relevant query results and keep processing times within the comfortable sub-second limit. Nevertheless, the exponential growth of the amount of content on the...
Efficient Distributed Top-k Query Processing with Caching
Recently, there has been increased interest in incorporating rank-aware query operators, such as top-k queries, into database management systems, allowing users to retrieve only the most interesting data objects. In this paper, we propose a cache-based approach for efficiently supporting top-k queries in distributed database management systems. In large distributed systems, the query performa...